 execution risk


A Fully Polynomial Time Approximation Scheme for Constrained MDPs and Stochastic Shortest Path under Local Transitions

arXiv.org Artificial Intelligence

The fixed-horizon constrained Markov Decision Process (C-MDP) is a well-known model for planning in stochastic environments under operating constraints. Chance-Constrained MDP (CC-MDP) is a variant that allows bounding the probability of constraint violation, which is desired in many safety-critical applications. CC-MDP can also model a class of MDPs, called Stochastic Shortest Path (SSP), under dead-ends, where there is a trade-off between the probability-to-goal and cost-to-goal. This work studies the structure of (C)C-MDP, particularly an important variant that involves local transitions. In this variant, the state reachability exhibits a certain degree of locality and independence from the remaining states. More precisely, the number of states, at a given time, that share some reachable future states is always constant. (C)C-MDP under local transitions is NP-Hard even for a planning horizon of two. In this work, we propose a fully polynomial-time approximation scheme for (C)C-MDP that computes (near) optimal deterministic policies. Such an algorithm is among the best approximation algorithms attainable in theory and gives insights into the approximability of constrained MDP and its variants.
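To make the problem statement concrete, here is a minimal sketch of a fixed-horizon C-MDP solved by exhaustive search over deterministic, time-dependent policies. It is not the approximation scheme from the abstract: it only illustrates the objective (minimize expected cost subject to a budget on a second expected cost), and its running time grows exponentially with the horizon and the number of states. The toy model and the names `evaluate`, `brute_force_cmdp`, `penalty`, and `budget` are assumptions made for illustration.

```python
from itertools import product

def evaluate(policy, P, cost, horizon, start):
    """Expected total of `cost` under a deterministic time-dependent policy.

    P[a][s][s2] is the transition probability, cost[s][a] the per-step cost,
    and policy[t][s] the action taken in state s at time t.
    """
    n_states = len(P[0])
    dist = [0.0] * n_states          # state distribution at the current step
    dist[start] = 1.0
    total = 0.0
    for t in range(horizon):
        nxt = [0.0] * n_states
        for s, p in enumerate(dist):
            if p == 0.0:
                continue
            a = policy[t][s]
            total += p * cost[s][a]
            for s2, q in enumerate(P[a][s]):
                nxt[s2] += p * q
        dist = nxt
    return total

def brute_force_cmdp(P, cost, penalty, budget, horizon, start=0):
    """Return (best_cost, policy) over all deterministic policies, or None."""
    n_actions, n_states = len(P), len(P[0])
    best = None
    for flat in product(range(n_actions), repeat=horizon * n_states):
        policy = [flat[t * n_states:(t + 1) * n_states] for t in range(horizon)]
        # Discard policies whose expected penalty exceeds the budget.
        if evaluate(policy, P, penalty, horizon, start) > budget + 1e-12:
            continue
        c = evaluate(policy, P, cost, horizon, start)
        if best is None or c < best[0]:
            best = (c, policy)
    return best

# Hypothetical toy model: state 0 = "en route", state 1 = "done" (absorbing).
# Action 0 is slow but penalty-free; action 1 is cheap, finishes with
# probability 0.5, and incurs one unit of penalty.
P = [
    [[1.0, 0.0], [0.0, 1.0]],   # action 0
    [[0.5, 0.5], [0.0, 1.0]],   # action 1
]
cost = [[2.0, 1.0], [0.0, 0.0]]
penalty = [[0.0, 1.0], [0.0, 0.0]]

best_cost, best_policy = brute_force_cmdp(P, cost, penalty, budget=1.0, horizon=2)
```

In this toy model, tightening `budget` forces the search toward the slower action, which is the cost-versus-constraint trade-off the abstract describes.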


Dual Formulation for Chance Constrained Stochastic Shortest Path with Application to Autonomous Vehicle Behavior Planning

arXiv.org Artificial Intelligence

Autonomous vehicles face the problem of optimizing the expected performance of subsequent maneuvers while bounding the risk of collision with surrounding dynamic obstacles. These obstacles, such as agent vehicles, often exhibit stochastic transitions that should be accounted for in a timely and safe manner. The Constrained Stochastic Shortest Path problem (C-SSP) is a formalism for planning in stochastic environments under certain types of operating constraints. While C-SSP allows specifying constraints in the planning problem, it does not allow for bounding the probability of constraint violation, which is desired in safety-critical applications. This work's first contribution is an exact integer linear programming formulation for Chance-constrained SSP (CC-SSP) that attains deterministic policies. Second, a randomized rounding procedure is presented for stochastic policies. Third, we show that the CC-SSP formalism can be generalized to account for constraints that span multiple time steps. Evaluation results show the usefulness of our approach in benchmark problems compared to existing approaches.


Applying AI to the war on financial crime

#artificialintelligence

Clouds are gathering over swaths of fintech companies, as falling economic growth, rising interest rates and a cost of living crisis put their business models under strain, forcing job cuts and valuation-crushing funding rounds. ComplyAdvantage founder Charlie Delingpole knows his company is not immune to those forces, as fintechs are among the biggest buyers of his financial crime prevention products. In fact, some clients, including crypto lender Celsius Network, have already gone bust. But the business -- which uses natural language processing and artificial intelligence (AI) to run compliance checks on transactions -- is proving more resilient than most, as Russia-related sanctions and a global clampdown on financial crime underpin healthy demand. "We're the last thing they turn off before their server," says Delingpole, a one-time JPMorgan Chase technology banker, of the enduring demand for his company's services from financial groups -- even when times are tight.


Risk Conditioned Neural Motion Planning

arXiv.org Artificial Intelligence

Risk-bounded motion planning is an important yet difficult problem for safety-critical tasks. While existing mathematical programming methods offer theoretical guarantees in the context of constrained Markov decision processes, they either lack scalability in solving larger problems or produce conservative plans. Recent advances in deep reinforcement learning improve scalability by learning policy networks as function approximators. In this paper, we propose an extension of the soft actor-critic model that estimates the execution risk of a plan through a risk critic and produces risk-bounded policies efficiently by adding an extra risk term to the loss function of the policy network. We define the execution risk in an accurate form, as opposed to approximating it through a summation of immediate risks at each time step, which leads to conservative plans. Our proposed model is conditioned on a continuous spectrum of risk bounds, allowing the user to adjust the agent's level of risk aversion on the fly. Through a set of experiments, we show the advantage of our model in terms of both computational time and plan quality, compared to a state-of-the-art mathematical programming baseline, and validate its performance in more complicated scenarios, including nonlinear dynamics and larger state spaces.
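The difference between an exact execution risk and a summation of immediate risks can be shown with a few lines of arithmetic. The sketch below assumes each per-step risk is the probability of failing at that step given that the plan has survived so far; this is an illustrative assumption and not the paper's risk critic. The function names and the sample numbers are hypothetical.

```python
from math import prod

def execution_risk(step_risks):
    """Probability of at least one failure over the plan.

    Assumes step_risks[t] is the failure probability at step t,
    conditioned on no failure before step t.
    """
    return 1.0 - prod(1.0 - r for r in step_risks)

def additive_risk(step_risks):
    """Sum of immediate risks: an upper estimate of the execution risk."""
    return sum(step_risks)

risks = [0.1, 0.2, 0.3]          # hypothetical per-step conditional risks
exact = execution_risk(risks)    # 1 - 0.9 * 0.8 * 0.7
approx = additive_risk(risks)    # 0.1 + 0.2 + 0.3
risk_bound = 0.5                 # hypothetical user-specified bound

accepted_exact = exact <= risk_bound
accepted_additive = approx <= risk_bound
```

With these numbers the exact risk is about 0.496 and the additive estimate is about 0.6, so a plan that satisfies a bound of 0.5 would be rejected by the additive approximation. This is the sense in which summing immediate risks produces conservative plans.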


Vulcan: A Monte Carlo Algorithm for Large Chance Constrained MDPs with Risk Bounding Functions

arXiv.org Artificial Intelligence

Chance Constrained Markov Decision Processes maximize reward subject to a bounded probability of failure, and have been frequently applied for planning with potentially dangerous outcomes or unknown environments. Solution algorithms have required strong heuristics or have been limited to relatively small problems with up to millions of states, because the optimal action to take from a given state depends on the probability of failure in the rest of the policy, leading to a coupled problem that is difficult to solve. In this paper we examine a generalization of a CCMDP that trades off probability of failure against reward through a functional relationship. We derive a constraint that can be applied to each state history in a policy individually, and which guarantees that the chance constraint will be satisfied. The approach decouples states in the CCMDP, so that large problems can be solved efficiently. We then introduce Vulcan, which uses our constraint in order to apply Monte Carlo Tree Search to CCMDPs. Vulcan can be applied to problems where it is unfeasible to generate the entire state space, and policies must be returned in an anytime manner. We show that Vulcan and its variants run tens to hundreds of times faster than linear programming methods, and over ten times faster than heuristic based methods, all without the need for a heuristic, and returning solutions with a mean suboptimality on the order of a few percent. Finally, we use Vulcan to solve for a chance constrained policy in a CCMDP with over $10^{13}$ states in 3 minutes.


Machine Learning Can Balance Risks Involved In Bitcoin Trading

International Business Times

In terms of directional movement, bitcoin the currency gained nearly 40% in 2015, making it one of the best-performing financial instruments out there. But traders often seek greater returns than that and don't necessarily want the directional exposure; they just want to capture bitcoin's volatility. This means trading bitcoin at a higher frequency while balancing transaction costs and execution risk, and this can be facilitated by machine learning. Arshak Navruzyan, the founder of Startup.ML, who has been applying machine learning to quantitative finance problems, found that cryptocurrency is also interesting because it gives relatively small-scale investors access to exchanges, where they can get full order book data and trade more cost-effectively than by going through a brokerage. Navruzyan said: "This is actually one of the exciting things about cryptocurrency; why a lot of our modelling work is happening in this area is because you do get access to exchanges even as a little guy."


RAO*: An Algorithm for Chance-Constrained POMDP's

AAAI Conferences

Autonomous agents operating in partially observable stochastic environments often face the problem of optimizing expected performance while bounding the risk of violating safety constraints. Such problems can be modeled as chance-constrained POMDP's (CC-POMDP's). Our first contribution is a systematic derivation of execution risk in POMDP domains, which improves upon how chance constraints are handled in the constrained POMDP literature. Second, we present RAO*, a heuristic forward search algorithm producing optimal, deterministic, finite-horizon policies for CC-POMDP's. In addition to the utility heuristic, RAO* leverages an admissible execution risk heuristic to quickly detect and prune overly-risky policy branches. Third, we demonstrate the usefulness of RAO* in two challenging domains of practical interest: power supply restoration and autonomous science agents.